Abstract

People’s personal social networks are big and cluttered, and currently there is no
good way to automatically organize them. Social networking sites allow users to manually
categorize their friends into social circles (e.g. ‘circles’ on Google+, and ‘lists’ on
Facebook and Twitter); however, these circles are laborious to construct and must be updated
whenever a user’s network grows. In this paper, we study the novel task of automatically
identifying users’ social circles. We pose this task as a multi-membership
node clustering problem on a user’s ego-network, a network of connections between her
friends. We develop a model for detecting circles that combines network structure with
user profile information. For each circle we learn its members and the circle-specific
user profile similarity metric. Modeling node membership to multiple circles
allows us to detect overlapping as well as hierarchically nested circles. Experiments
show that our model accurately identifies circles on a diverse set of data from Facebook,
Google+, and Twitter, for all of which we obtain hand-labeled ground-truth.


1  Introduction 


Online social networks allow us to follow streams of posts generated by hundreds of our
friends and acquaintances. The people we follow generate overwhelming volumes of information,
and to cope with the ‘information overload’ we need to organize our personal social
networks (Agarwal et al., 2008; Chen and Karger, 2006; El-Arini et al., 2009). One of the
main mechanisms for users of social networking sites to organize their networks and the
content generated by them is to categorize their friends into what we refer to as social
circles. Practically all major social networks provide such functionality, for example, ‘circles’
on Google+, and ‘lists’ on Facebook and Twitter. Once a user creates her circles, they can
be used for content filtering, for privacy, and for sharing groups of users that others may
wish to follow.

Examples of circles from a user’s personal social network are shown in Figure 1. The
‘owner’ of such a network (the ‘ego’) may form circles based on common bonds and
attributes between themselves and the users whom they follow. In this example, the ego
may  wish  to  share  their  latest  TKDD  article  only  with  their  friends  from  the  computer 
science  department,  while  their  baby  photos  should  be  shared  only  with  their  immediate 
family;  similarly,  they  may  wish  to  limit  the  amount  of  content  generated  by  their  high- 
school  friends.  These  are  precisely  the  types  of  functionality  that  circles  are  intended  to 
facilitate. 


Figure  1:  An  ego-network  with  labeled  circles.  The  central  user,  the  ‘ego’,  is  friends  with 
all  other  users  (the  ‘alters’)  in  the  network.  Alters  may  belong  to  any  number  of  circles, 
including  none.  We  aim  to  discover  circle  memberships  and  to  find  common  properties 
around  which  circles  form.  This  network  shows  typical  behavior  that  we  observe  in  our  data: 
Approximately  25%  of  our  ground-truth  circles  (from  Facebook)  are  contained  completely 
within  another  circle,  50%  overlap  with  another  circle,  and  25%  of  the  circles  have  no 
members  in  common  with  any  other  circle. 


Currently, users in Facebook, Google+ and Twitter identify their circles either manually,
or in a naive fashion by identifying friends sharing a common feature. Neither
approach is particularly satisfactory: the former is time consuming and does not update
automatically as a user adds more friends, while the latter fails to capture individual
aspects of users’ communities, and may function poorly when profile information is missing
or withheld.

In  this  paper  we  study  the  problem  of  automatically  discovering  users’  social  circles. 
In  particular,  given  a  single  user  with  her  personal  social  network,  our  goal  is  to  identify 
her  circles,  each  of  which  is  a  subset  of  her  friends. 

Circles are user-specific as each user organizes her personal network of friends independently
of all other users to whom she is not connected. This means that we can formulate
the  problem  of  circle  detection  as  a  clustering  problem  on  her  ego-network,  the  network  of 
friendships  between  her  friends.  In  practice,  circles  may  overlap  (a  circle  of  friends  from 
the  same  hometown  may  overlap  with  a  circle  from  the  same  college),  or  be  hierarchically 
nested  (among  friends  from  the  same  college  there  may  be  a  denser  circle  from  the  same 
degree  program).  We  design  our  model  with  both  types  of  behavior  in  mind. 

In Figure 1 we are given a single user $u$ and we form a network between her friends $v_i$.
We refer to the user $u$ as the ego and to the nodes $v_i$ as alters. The task is then to identify
the circles to which each alter $v_i$ belongs, as in Figure 1. In other words, the goal is to find
communities/clusters in $u$’s ego-network.

Generally, there are two useful sources of data that help with this task. The first is the
set of edges of the ego-network. We expect that circles are formed by densely-connected
sets of alters (Newman, 2006). However, different circles overlap heavily, i.e., alters belong
to multiple circles simultaneously (Ahn et al., 2010; Palla et al., 2005), and many circles
are hierarchically nested in larger ones (as in Figure 1). Thus it is important to model an
alter’s memberships to multiple circles. Secondly, we expect that each circle is not only
densely connected but that its members also share common properties or traits (Mislove
et al., 2010). Thus we need to explicitly model the different dimensions of user profiles
along which each circle emerges.

We model circle affiliations as latent variables, and similarity between alters as a function
of common profile information. We propose an unsupervised method to learn which
dimensions  of  profile  similarity  lead  to  densely  linked  circles.  After  developing  a  model 
for  this  problem,  we  then  study  the  related  problems  of  updating  a  user’s  circles  once  new 
friends  are  added  to  the  network,  and  using  weak  supervision  from  the  user  in  the  form 
of  ‘seed  nodes’  to  improve  classification.  For  the  former  problem,  we  show  that  given  an 
already-defined set of a user’s circles, we can accurately predict to which circles a new user
should  be  assigned.  For  the  latter  problem,  we  show  that  classification  accuracy  improves 
for  each  seed  node  that  a  user  provides,  though  substantial  improvements  in  accuracy  are 
already  obtained  even  with  2-3  seeds. 


Our model has two innovations: First, in contrast to mixed-membership models (Airoldi
et al., 2008) we predict hard assignment of a node to multiple circles, which proves critical
for good performance (Gregory, 2010b). Second, by proposing a parameterized definition
of profile similarity, we learn the dimensions of similarity along which links emerge (Feld,
1981; Simmel, 1964). This extends the notion of homophily (Lazarsfeld and Merton, 1954;
McPherson et al., 2001) by allowing different circles to form along different social dimensions,
an idea related to the concept of Blau spaces (McPherson, 1983). We achieve this
by allowing each circle to have a different definition of profile similarity, so that one circle
might form around friends from the same school, and another around friends from the
same location. We learn the model by simultaneously choosing node circle memberships
and profile similarity functions so as to best explain the observed data.

We  introduce  a  dataset  of  1,143  ego-networks  from  Facebook,  Google+,  and  Twitter, 
for which we obtain hand-labeled ground-truth from 5,541 circles. Experimental results
show  that  by  simultaneously  considering  social  network  structure  as  well  as  user  profile 
information  our  method  performs  significantly  better  than  natural  alternatives  and  the 
current  state-of-the-art.  Besides  being  more  accurate  our  method  also  allows  us  to  generate 
automatic  explanations  of  why  certain  nodes  belong  to  common  communities.  Our  method 
is  completely  unsupervised,  and  is  able  to  automatically  determine  both  the  number  of 
circles  as  well  as  the  circles  themselves.  We  show  that  the  same  model  can  be  adapted  to 
deal  with  weak  supervision,  and  to  update  already-complete  circles  as  new  users  arrive. 

A  preliminary  version  of  this  article  appeared  in  McAuley  and  Leskovec  (2012). 


1.1  Further  Related  Work 

Although a ‘circle’ is not precisely the same as a ‘community’, our work broadly falls
under the umbrella of community detection (Lancichinetti and Fortunato, 2009a; Schaeffer,
2007; Leskovec et al., 2010; Porter et al., 2009; Newman, 2004). While ‘classical’ clustering
algorithms assume disjoint communities (Schaeffer, 2007), many authors have made
the observation that communities in real-world networks may overlap (Lancichinetti and
Fortunato, 2009b; Gregory, 2010a; Lancichinetti et al., 2009; Yang and Leskovec, 2012), or
have hierarchical structure (Ravasz and Barabási, 2003).

Topic-modeling techniques have been used to uncover ‘mixed-memberships’ of nodes
to multiple groups, and extensions allow entities to be attributed with text information.
Airoldi et al. (2008) modeled node attributes as latent variables drawn from a Dirichlet
distribution, so that each attribute can be thought of as a partial membership to a community.
Other authors extended this idea to allow for side-information associated with the
nodes and edges (Balasubramanyan and Cohen, 2011; Chang and Blei, 2009; Liu et al.,
2009). A related line of work by Hoff et al. (2002) also used latent node attributes to
model edge formation between ‘similar’ users, which they adapted to clustering problems
in Handcock et al. (2007b) and Krivitsky et al. (2009).

Classical clustering algorithms tend to identify communities based on node features
(Johnson, 1967) or graph structure (Ahn et al., 2010; Palla et al., 2005), but rarely use both
in concert. Our work is related to Yoshida (2010) in the sense that it performs clustering
on social-network data, and Frank et al. (2012), which models memberships to multiple
communities.  Another  work  closely  related  to  ours  is  Yang  and  Leskovec  (2012),  which 
explicitly  models  hard  memberships  of  nodes  to  multiple  overlapping  communities,  though 
it  does  so  purely  based  on  network  information  rather  than  node  features.  Our  inference 
procedure  is  also  similar  to  that  of  Hastings  (2006),  which  treats  nodes’  assignments  to 
communities  as  a  Maximum  a  Posteriori  inference  problem  between  a  set  of  interdependent 
variables. 

Finally,  Chang  et  al.  (2009);  Menon  and  Elkan  (2011,  2010)  and  Vu  et  al.  (2011)  model 
network  data  similar  to  ours;  like  our  own  work,  they  model  the  probability  that  two  nodes 
will  form  an  edge,  though  the  underlying  models  do  not  form  communities,  so  they  are  not 
immediately  applicable  to  the  problem  of  circle  detection. 

The rest of this paper is organized as follows. We propose a generative model for the
formation of edges within communities in Section 2. In Section 3 we derive an efficient
model parameter learning strategy. In Section 4 we describe extensions to our model that
allow it to be used in semi-supervised settings, in order to help users update and maintain
their circles. We describe the datasets that we construct in Section 5. We give two schemes
for automatically constructing parameterized user similarity functions from profile data in
Section 6. In Section 7 we show how to scale the model to large ego-networks. Finally, in
Section 8 we describe our evaluation and experimental results.


2  A  Generative  Model  for  Friendships  in  Social  Circles 

We  desire  a  model  of  circle  formation  with  the  following  properties: 

1.  Nodes  within  circles  should  have  common  properties,  or  ‘aspects’. 

2.  Different  circles  should  be  formed  by  different  aspects,  e.g.  one  circle  might  be  formed 
by  family  members,  and  another  by  students  who  attended  the  same  university. 

3. Circles should be allowed to overlap, and ‘stronger’ circles should be allowed to form
within ‘weaker’ ones, e.g. a circle of friends from the same degree program may form
within a circle from the same university, as in Figure 1.

4. We would like to leverage both profile information and network structure in order to
identify circles.

5.  Ideally  we  would  like  to  be  able  to  pinpoint  which  aspects  of  a  profile  caused  a  circle 
to  form,  so  that  the  model  is  interpretable  by  the  user. 

The input to our model is an ego-network $G = (V, E)$, along with ‘profiles’ for each
user $v \in V$. The ‘center’ node $u$ of the ego-network (the ‘ego’) is not included in $G$, but
rather $G$ consists only of $u$’s friends (the ‘alters’). We define the ego-network in this way
precisely because creators of circles do not themselves appear in their own circles. For each
ego-network, our goal is to predict a set of circles $C = \{C_1 \ldots C_K\}$, $C_k \subseteq V$, and associated
parameter vectors $\theta_k$ that encode how each circle emerged. We encode ‘user profiles’ into
pairwise features $\phi(x, y)$ that in some way capture what properties the users $x$ and $y$ have in
common. We first describe our model, which can be applied using arbitrary feature vectors
$\phi(x, y)$, and in Section 6 we develop several ways to construct feature vectors $\phi(x, y)$ that
are suited to our particular application.
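As a small illustration of the input described above, the following sketch builds an ego-network from a flat friendship list. The function name `ego_network`, the toy edge list, and the choice to store undirected edges as `frozenset` pairs are assumptions made for this example only; they are not taken from the paper's implementation.

```python
def ego_network(edges, ego):
    """Return (alters, alter_edges) for `ego`, given an undirected edge list."""
    # Alters are the users directly connected to the ego.
    alters = {v for u, v in edges if u == ego} | {u for u, v in edges if v == ego}
    alters.discard(ego)  # ignore self-loops, if any
    # Keep only friendships between two alters; the ego itself is not part of G.
    alter_edges = {
        frozenset((u, v))
        for u, v in edges
        if u in alters and v in alters and u != v
    }
    return alters, alter_edges


# Toy friendship list: "u" is the ego, "d" is only a friend-of-a-friend.
edges = [("u", "a"), ("u", "b"), ("u", "c"), ("a", "b"), ("c", "d")]
alters, alter_edges = ego_network(edges, "u")
print(sorted(alters))  # ['a', 'b', 'c']
```

For a directed network, ordered tuples would replace the `frozenset` pairs.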

We  describe  a  model  of  social  circles  that  treats  circle  memberships  as  latent  variables. 
Nodes  within  a  common  circle  are  given  an  opportunity  to  form  an  edge,  which  naturally 
leads  to  hierarchical  and  overlapping  circles.  We  will  then  devise  an  unsupervised  algorithm 
to  jointly  optimize  the  latent  variables  and  the  profile  similarity  parameters  so  as  to  best 
explain  the  observed  network  data. 

Our model of social circles is defined as follows. Given an ego-network $G$ and a set of
$K$ circles $C = \{C_1 \ldots C_K\}$, we model the probability that a pair of nodes $(x, y) \in V \times V$
form an edge as

$$p((x, y) \in E) \propto \exp\Big\{ \underbrace{\sum_{C_k \supseteq \{x, y\}} \langle \phi(x, y), \theta_k \rangle}_{\text{circles containing both nodes}} - \underbrace{\sum_{C_k \nsupseteq \{x, y\}} \alpha_k \langle \phi(x, y), \theta_k \rangle}_{\text{all other circles}} \Big\}. \tag{1}$$

For each circle $C_k$, $\theta_k$ is the profile similarity parameter that we will learn. The idea is
that $\langle \phi(x, y), \theta_k \rangle$ is high if both nodes belong to $C_k$, and low if either of them does not. The
parameter $\alpha_k$ trades off these two effects, i.e., it trades off the influence of edges within $C_k$
compared to edges outside of (or crossing) $C_k$. Since the feature vector $\phi(x, y)$ encodes the
similarity between the profiles of two users $x$ and $y$, the parameter vector $\theta_k$ encodes which
dimensions of profile similarity caused the circle to form, so that nodes within a circle $C_k$
should ‘look similar’ according to $\theta_k$. Note that the pair $(x, y)$ should be treated as an
unordered pair in the case of an undirected network (e.g. Facebook), but should be treated
as an ordered pair for directed networks (e.g. Google+ and Twitter).
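To make the roles of the per-circle similarity parameters and trade-off weights concrete, here is a minimal sketch of the score inside the exponent of (eq. 1) for a single pair. The feature vector, the two circles, and all parameter values are invented for illustration, and normalizing against the "no edge" outcome with a logistic function is the reading implied by the log-likelihood of (eq. 3).

```python
import math


def edge_score(phi, circles, thetas, alphas, x, y):
    """Exponent of eq. 1: reward circles containing both x and y, penalize the rest."""
    score = 0.0
    for members, theta, alpha in zip(circles, thetas, alphas):
        similarity = sum(p * t for p, t in zip(phi, theta))  # <phi(x, y), theta_k>
        if x in members and y in members:
            score += similarity
        else:
            score -= alpha * similarity
    return score


def edge_probability(score):
    """Normalize exp(score) against the 'no edge' alternative (toy, not overflow-safe)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical setup: feature 0 = "same school", feature 1 = "same hometown".
circles = [{"a", "b"}, {"b", "c"}]
thetas = [[2.0, 0.0], [0.0, 1.5]]   # circle 0 cares about school, circle 1 about hometown
alphas = [1.0, 1.0]
phi_same_school = [1.0, 0.0]

score_ab = edge_score(phi_same_school, circles, thetas, alphas, "a", "b")
score_ac = edge_score(phi_same_school, circles, thetas, alphas, "a", "c")
print(score_ab, score_ac)  # 2.0 -2.0
```

With these toy numbers, the pair inside the 'school' circle gets a positive score, while the same profile similarity counts against a pair that shares no circle.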

Considering that edges $e = (x, y)$ are generated independently, we can write the probability
of $G$ as

$$P_\Theta(G; C) = \prod_{e \in E} p(e \in E) \times \prod_{e \notin E} p(e \notin E), \tag{2}$$

where $\Theta = \{(\theta_k, \alpha_k)\}^{k=1 \ldots K}$ is our set of model parameters. Defining the shorthand notation

$$d_k(e) = \delta(e \in C_k) - \alpha_k \delta(e \notin C_k), \qquad \Phi(e) = \sum_{C_k \in C} d_k(e) \langle \phi(e), \theta_k \rangle$$

allows us to write the log-likelihood of $G$:

$$l_\Theta(G; C) = \sum_{e \in E} \Phi(e) - \sum_{e \in V \times V} \log\left(1 + e^{\Phi(e)}\right), \tag{3}$$

where $Z = 1 + e^{\Phi(e)}$ is a normalization constant.
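The log-likelihood of (eq. 3) can be evaluated directly on a small undirected graph. The sketch below is one possible reading, assuming unordered pairs, a user-supplied feature function `phi`, and NumPy arrays `thetas` (shape K x F) and `alphas` (length K); all of these names and the toy data are illustrative assumptions.

```python
import itertools

import numpy as np


def log_likelihood(nodes, edges, phi, circles, thetas, alphas):
    """Eq. 3: Phi(e) summed over observed edges, minus log(1 + exp(Phi(e))) over all pairs."""
    total = 0.0
    for x, y in itertools.combinations(sorted(nodes), 2):
        sims = thetas @ phi(x, y)                        # <phi(e), theta_k> for every k
        inside = np.array([x in c and y in c for c in circles])
        d = np.where(inside, 1.0, -alphas)               # d_k(e)
        score = float(d @ sims)                          # Phi(e)
        if frozenset((x, y)) in edges:
            total += score
        total -= np.logaddexp(0.0, score)                # stable log(1 + e^score)
    return total


# Toy example: one observed edge a-b and a single one-dimensional feature.
nodes = {"a", "b", "c"}
edges = {frozenset(("a", "b"))}
phi = lambda x, y: np.array([1.0])
thetas = np.array([[1.0]])
alphas = np.array([1.0])

ll_good = log_likelihood(nodes, edges, phi, [{"a", "b"}], thetas, alphas)
ll_bad = log_likelihood(nodes, edges, phi, [set()], thetas, alphas)
print(ll_good > ll_bad)  # True: the circle that covers the observed edge explains G better
```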

Next, we describe how to optimize node circle memberships $C$ as well as the parameters
of the user profile similarity functions $\Theta = \{(\theta_k, \alpha_k)\}$ $(k = 1 \ldots K)$ given a graph $G$ and
user profiles.


3  Unsupervised  Learning  of  Model  Parameters 


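Treating circles $C$ as latent variables, we aim to find $\hat{\Theta} = \{\hat{\theta}, \hat{\alpha}\}$ so as to maximize the
regularized log-likelihood of (eq. 3), i.e.,

$$\hat{\Theta}, \hat{C} = \operatorname*{argmax}_{\Theta, C} \; l_\Theta(G; C) - \lambda \Omega(\theta). \tag{4}$$

We solve this problem using coordinate ascent on $\Theta$ and $C$ (MacKay, 2003):

$$C^t = \operatorname*{argmax}_{C} \; l_{\Theta^t}(G; C) \tag{5}$$

$$\Theta^{t+1} = \operatorname*{argmax}_{\Theta} \; l_\Theta(G; C^t) - \lambda \Omega(\theta). \tag{6}$$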

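We optimize (eq. 6) using L-BFGS, a standard quasi-Newton procedure to optimize smooth
functions of many variables (Nocedal, 1980). Partial derivatives are given by

$$\frac{\partial l}{\partial \theta_k} = \sum_{e \in V \times V} -d_k(e)\, \phi(e)\, \frac{e^{\Phi(e)}}{1 + e^{\Phi(e)}} + \sum_{e \in E} d_k(e)\, \phi(e) - \lambda \frac{\partial \Omega}{\partial \theta_k} \tag{7}$$

$$\frac{\partial l}{\partial \alpha_k} = \sum_{e \in V \times V} \delta(e \notin C_k) \langle \phi(e), \theta_k \rangle \frac{e^{\Phi(e)}}{1 + e^{\Phi(e)}} - \sum_{e \in E} \delta(e \notin C_k) \langle \phi(e), \theta_k \rangle. \tag{8}$$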
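The partial derivatives of (eqs. 7 and 8) can be written compactly in vectorized form. The sketch below assumes a hypothetical array layout chosen only for this example: `feat` holds one feature row per node pair, `inside[e, k]` records whether pair `e` lies inside circle `k`, and `is_edge` marks observed edges. The regularization term is left out, so the functions return the derivatives of the unregularized log-likelihood only.

```python
import numpy as np


def scores(feat, inside, thetas, alphas):
    sims = feat @ thetas.T                       # (P, K): <phi(e), theta_k>
    d = np.where(inside, 1.0, -alphas)           # (P, K): d_k(e)
    return sims, d, (d * sims).sum(axis=1)       # Phi(e) for every pair


def log_likelihood(feat, inside, is_edge, thetas, alphas):
    _, _, phi_e = scores(feat, inside, thetas, alphas)
    return phi_e[is_edge].sum() - np.logaddexp(0.0, phi_e).sum()


def gradients(feat, inside, is_edge, thetas, alphas):
    sims, d, phi_e = scores(feat, inside, thetas, alphas)
    # 1[e in E] - e^Phi / (1 + e^Phi), shared by both derivatives
    resid = is_edge - 1.0 / (1.0 + np.exp(-phi_e))
    grad_theta = (d * resid[:, None]).T @ feat                       # eq. 7, no regularizer
    grad_alpha = -((~inside) * sims * resid[:, None]).sum(axis=0)    # eq. 8
    return grad_theta, grad_alpha


# Random toy problem: 6 pairs, 3 binary features, 2 circles.
rng = np.random.default_rng(0)
feat = rng.integers(0, 2, size=(6, 3)).astype(float)
inside = rng.random((6, 2)) < 0.5
is_edge = rng.random(6) < 0.5
thetas = rng.normal(size=(2, 3))
alphas = np.array([0.5, 1.5])

grad_theta, grad_alpha = gradients(feat, inside, is_edge, thetas, alphas)
```

A finite-difference comparison against `log_likelihood` is a cheap way to check such gradients before handing them to a quasi-Newton optimizer.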


For fixed $C \setminus C_k$ we note that solving $\operatorname{argmax}_{C_k} l_\Theta(G; C \setminus C_k)$ can be expressed as
pseudo-boolean optimization in a pairwise graphical model (Boros and Hammer, 2002).
‘Pseudo-boolean optimization’ refers to problems defined over boolean variables (in this
case, whether or not a node is assigned to a particular community), where the variables
being optimized are interdependent (in this case, relationships are defined over edges in a
graph). In short, our optimization problem can be written in the form

$$C_k = \operatorname*{argmax}_{C} \sum_{(x, y) \in V \times V} E_{(x, y)}(\delta(x \in C), \delta(y \in C)). \tag{9}$$


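Although this problem class is NP-hard in general, efficient approximation algorithms are
readily available (Rother et al., 2007). In our setting, we want edges with high weight
(under $\theta_k$) to appear in $C_k$, and edges with low weight to appear outside of $C_k$. Defining

$$o_k(e) = \sum_{C_j \in C \setminus C_k} d_j(e) \langle \phi(e), \theta_j \rangle$$

the energy $E^k_e$ of (eq. 9) is

$$E^k_e(0, 0) = E^k_e(0, 1) = E^k_e(1, 0) = \begin{cases} o_k(e) - \alpha_k \langle \phi(e), \theta_k \rangle - \log\left(1 + e^{o_k(e) - \alpha_k \langle \phi(e), \theta_k \rangle}\right), & e \in E \\ -\log\left(1 + e^{o_k(e) - \alpha_k \langle \phi(e), \theta_k \rangle}\right), & e \notin E \end{cases}$$

$$E^k_e(1, 1) = \begin{cases} o_k(e) + \langle \phi(e), \theta_k \rangle - \log\left(1 + e^{o_k(e) + \langle \phi(e), \theta_k \rangle}\right), & e \in E \\ -\log\left(1 + e^{o_k(e) + \langle \phi(e), \theta_k \rangle}\right), & e \notin E. \end{cases}$$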


By expressing the problem in this form we can draw upon existing work on pseudo-boolean
optimization. We use the publicly-available ‘QPBO’ software described in Rother et al.
(2007), which implements algorithms described in Hammer et al. (1984) and Kohli and
Torr (2005), and is able to accurately approximate problems of the form shown in (eq. 9).
Essentially, problems of the type shown in (eq. 9) are reduced to maximum flow, where
boolean labels for each node are recovered from their assignments to ‘source’ and ‘sink’ sets.
Such algorithms have worst-case complexity $O(|N|^3)$, though the average case running-time
is far better (Kolmogorov and Rother, 2007). We solve (eq. 9) for each $C_k$ in a random
order.
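The structure of the per-circle problem in (eq. 9) can be seen on a toy instance by exhaustive search. The sketch below is a brute-force stand-in, not QPBO: it enumerates every subset of nodes, which is only feasible for a handful of nodes. The pairwise energy used here is a made-up example (reward edges inside the circle, penalize non-edges inside it); the energies defined in the text could be plugged in through the same `energy` callback.

```python
import itertools


def best_circle(nodes, energy):
    """Exhaustively maximize the sum of pairwise energies over all candidate circles."""
    nodes = sorted(nodes)
    pairs = list(itertools.combinations(nodes, 2))
    best, best_value = frozenset(), float("-inf")
    for size in range(len(nodes) + 1):
        for subset in itertools.combinations(nodes, size):
            members = frozenset(subset)
            # energy(x, y, x_in_circle, y_in_circle) mirrors E_(x,y)(delta, delta)
            value = sum(energy(x, y, x in members, y in members) for x, y in pairs)
            if value > best_value:
                best, best_value = members, value
    return best, best_value


# Hypothetical toy graph: a triangle a-b-c plus an isolated node d.
toy_edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c")]}


def toy_energy(x, y, x_in, y_in):
    if x_in and y_in:
        return 1.0 if frozenset((x, y)) in toy_edges else -1.0
    return 0.0


circle, value = best_circle("abcd", toy_energy)
print(sorted(circle), value)  # ['a', 'b', 'c'] 3.0
```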

The two optimization steps of (eq. 5) and (eq. 6) are repeated until convergence, i.e.,
until $C^{t+1} = C^t$. The entire procedure is presented in Algorithm 1. We regularize (eq. 4)
using the $\ell_1$ norm, i.e.,

$$\Omega(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{|\theta_k|} |\theta_{ki}|,$$

which leads to sparse (and readily interpretable) parameters. Our algorithm can readily
handle all but the largest problem sizes typically observed in ego-networks: in the case of
Facebook, the average ego-network has around 190 nodes (Ugander et al., 2011), while the
largest network we encountered has 4,964 nodes. Later, in Section 7, we will exploit the
fact that our features are binary, and that many nodes share similar features, to develop
more efficient algorithms based on Markov Chain Monte Carlo inference. Note that since
the method is unsupervised, inference is performed independently for each ego-network.
This means that our method could be run on the full Facebook graph (for example), as
circles are independently detected for each user, and the ego-networks typically contain
only hundreds of nodes. In Section 4 we describe extensions that allow our model to be
used in semi-supervised settings.


3.1  Hyperparameter  Estimation 


To choose the optimal number of circles, we choose $K$ so as to minimize an approximation
to the Bayesian Information Criterion (BIC), an idea seen in several works on probabilistic
clustering (Airoldi et al., 2008; Handcock et al., 2007a; Volinsky and Raftery, 2000). In
this context, the Bayesian Information Criterion is defined as

$$BIC(K; \Theta^K) \simeq -2\, l_{\Theta^K}(G; C) + |\Theta^K| \log |E|, \tag{10}$$

where $\Theta^K$ is the set of parameters predicted when there are $K$ circles, and $|\Theta^K|$ is the
number of parameters (which increases linearly as $K$ increases). We then choose $K$ so as
to minimize this objective:

$$\hat{K} = \operatorname*{argmin}_{K} BIC(K; \Theta^K). \tag{11}$$
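The selection rule of (eqs. 10 and 11) amounts to a penalized comparison of fitted models. The sketch below assumes that a model has already been fitted for each candidate $K$; the log-likelihoods, parameter counts, and the value passed as `num_observations` (the quantity inside the logarithm of eq. 10) are invented numbers for illustration.

```python
import math


def bic(log_likelihood, num_params, num_observations):
    """Approximate BIC as in eq. 10; lower values are preferred."""
    return -2.0 * log_likelihood + num_params * math.log(num_observations)


def choose_k(fits, num_observations):
    """`fits` maps K -> (log_likelihood, num_params); return the K with the smallest BIC."""
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], num_observations))


# Hypothetical fits: the third circle barely improves the likelihood.
fits = {1: (-520.0, 11), 2: (-480.0, 22), 3: (-478.0, 33)}
best_k = choose_k(fits, num_observations=400)
print(best_k)  # 2
```

Here the move from two to three circles improves the log-likelihood by only 2, which does not pay for the 11 extra parameters, so the smaller model is kept.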
ALGORITHM 1: Predict complete circles with hyperparameters $\lambda$, $K$.

Data: ego-network $G = (V, E)$, edge features $\phi(e) : E \to \mathbb{R}^F$, hyperparameters $\lambda$, $K$
Result: parameters $\hat{\Theta} := \{(\hat{\theta}_k, \hat{\alpha}_k)\}^{k=1 \ldots K}$, communities $\hat{C}$
initialize $\theta^0_k \in \{0, 1\}^F$, $\alpha^0_k := 1$, $C_k := \emptyset$, $t := 0$;
repeat
    for $k \in \{1 \ldots K\}$ do
        $C^t_k := \operatorname{argmax}_C \sum_{(x, y) \in V \times V} E_{(x, y)}(\delta(x \in C), \delta(y \in C))$;   // using QPBO, see (eq. 9)
    end
    $\Theta^{t+1} := \operatorname{argmax}_\Theta l_\Theta(G; C^t) - \lambda \Omega(\theta)$;   // using L-BFGS, see (eqs. 7 and 8)
    $t := t + 1$;
until $C^{t+1} = C^t$;


In  other  words,  an  additional  circle  will  only  be  added  to  the  model  if  doing  so  has  a 
‘significant’  impact  on  the  log-likelihood. 

The regularization parameter $\lambda \in \{0, 1, 10, 100\}$ was determined using leave-one-out
cross validation, though in our experience it did not significantly impact performance.


4  Extensions 


So  far,  we  have  considered  the  ‘cold-start’  problem  of  predicting  complete  sets  of  circles 
using  nothing  but  node  attributes  and  edge  information.  In  other  words,  we  have  treated 
circle  prediction  as  an  unsupervised  task.  This  setting  is  realistic  if  users  construct  their 
circles  only  after  their  ego-networks  have  already  been  defined.  On  the  other  hand,  in 
settings  where  users  build  their  circles  incrementally,  it  is  less  likely  that  we  would  wish 
to  predict  complete  circles  ‘from  scratch’.  We  note  that  both  settings  occur  in  the  three 
social  networks  that  we  consider. 

In this section, we describe techniques to exploit partially observed circle information
to help users update and maintain their circles. In other words, we would like to apply
our model to users’ personal networks as they change and evolve (Backstrom et al., 2006).
Since our model is probabilistic, it is straightforward to adapt it to make use of partially
observed data, by conditioning on the assignments of some of the latent variables in our
model. In this way, we adapt our model for semi-supervised settings in which a user labels
some or all of the members of their circles. Later, in Section 7, we describe modifications
of our model that allow it to be applied to extremely large networks, by exploiting the fact
that many users assigned to common circles also have common features.


4.1  Circle  Maintenance 

First  we  deal  with  the  problem  of  a  user  adding  new  friends  to  an  established  ego-network, 
whose  circles  have  already  been  defined.  Thus,  given  a  complete  set  of  circles,  our  goal 
is  to  predict  community  memberships  for  a  new  node,  based  on  that  node’s  features,  and 
their  patterns  of  connectivity  to  existing  nodes  in  the  ego-network. 

Since circles in this setting are fully-observed, we simply fit the model parameters that
best explain the ground-truth circles $C$ provided by the user:

$$\hat{\Theta} = \operatorname*{argmax}_{\Theta} \; l_\Theta(G; C) - \lambda \Omega(\theta). \tag{12}$$

As with (eq. 6), this is solved using L-BFGS, though optimization is significantly faster in
this  case  as  there  are  no  longer  latent  community  memberships  to  infer,  and  thus  coordinate 
ascent  is  not  required. 

Next, we must predict to which of the $K$ ground-truth circles a new user $u$ belongs.
That is, we must predict $c^u \in \{0, 1\}^K$, where each $c^u_k$ is a binary variable indicating whether
the user $u$ should belong to the circle $C_k$. In practice, for the sake of evaluation, we shall
suppress a single user from $G$ and $C$, and try to recover their memberships.

This can be done by choosing the assignment $c^u$ that maximizes the log-likelihood of
$C$ once $u$ is added to the graph. We define the augmented community memberships as
$C^+ = \{C^+_k\}^{k=1 \ldots K}$, where

$$C^+_k = \begin{cases} C_k \cup \{u\}, & c^u_k = 1 \\ C_k, & c^u_k = 0. \end{cases} \tag{13}$$

The updated community memberships are then chosen according to

$$\hat{C}^+ = \operatorname*{argmax}_{c^u} \; l_{\hat{\Theta}}(G \cup \{u\}; C^+). \tag{14}$$

The above expression can be computed efficiently for different values of $c^u$ by noting that
the log-likelihood only changes for terms including $u$, meaning that we need to compute
$p((x, y) \in E)$ only if $x = u$ or $y = u$. In other words, we only need to consider how the
new user relates to existing users, rather than considering how existing users relate to each
other; thus computing the log-likelihood requires linear (rather than quadratic) time. To
find the optimal $c^u$ we can simply enumerate all $2^K$ possibilities, which is feasible so long
as the user has no more than $K \simeq 20$ circles. For users with more circles we must resort
to an iterative update scheme as we did in Section 3.
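The enumeration described above can be sketched as follows. Only pairs involving the new user are scored, so each candidate assignment costs time linear in the number of existing alters. The function name, the constant toy feature function, and the parameter values are assumptions for illustration, and the network is treated as undirected.

```python
import itertools
import math


def assign_new_user(alters, circles, thetas, alphas, phi, new_user, neighbors):
    """Try all 2^K membership vectors for new_user; score only pairs involving new_user."""
    best, best_ll = None, -math.inf
    for bits in itertools.product((0, 1), repeat=len(circles)):
        ll = 0.0
        for v in alters:  # linear in the number of existing users
            feats = phi(new_user, v)
            score = 0.0
            for c_k, circle, theta, alpha in zip(bits, circles, thetas, alphas):
                sim = sum(f * t for f, t in zip(feats, theta))
                # the pair lies inside circle k only if new_user joins it and v is a member
                score += sim if (c_k and v in circle) else -alpha * sim
            if v in neighbors:
                ll += score
            ll -= math.log1p(math.exp(score))  # toy version, not overflow-safe
        if ll > best_ll:
            best, best_ll = bits, ll
    return best


# Hypothetical ego-network with two disjoint circles; the new user knows only a and b.
alters = ["a", "b", "c", "d"]
circles = [{"a", "b"}, {"c", "d"}]
thetas = [[1.0], [1.0]]
alphas = [1.0, 1.0]
phi = lambda x, y: [1.0]

memberships = assign_new_user(alters, circles, thetas, alphas, phi, "u", {"a", "b"})
print(memberships)  # (1, 0)
```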


4.2  Semi-Supervised  Circle  Prediction 


Next, we consider the problem of using weak supervision in the form of ‘seed nodes’ to
assist in circle prediction (Andersen and Lang, 2006). In this setting, the user manually
labels a few users from each of the circles they want to create, say $\{s_1 \ldots s_K\}$. Our goal
is then to predict $K$ circles $C = \{C_1 \ldots C_K\}$ subject to the constraint that $s_k \subseteq C_k$ for all
$k \in \{1 \ldots K\}$.

Again, since our model is probabilistic, this can be done by conditioning on the assignments
of some of the latent variables. That is, we simply optimize $l_\Theta(G; C)$ subject
to the constraint that $s_k \subseteq C_k$ for all $k \in \{1 \ldots K\}$. In the parlance of graphical models,
this  means  that  rather  than  treating  the  seed  nodes  as  latent  variables  to  be  predicted,  we 
treat  them  as  evidence  on  which  we  condition.  We  could  also  include  negative  evidence 
(i.e.,  the  user  could  provide  labels  for  users  who  do  not  belong  to  each  circle),  or  we  could 
have  users  provide  additional  labels  interactively,  though  the  setting  described  is  the  most 
similar  to  what  is  used  in  practice. 
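Conditioning on seed nodes can be illustrated by restricting the search space so that every candidate circle contains its seeds. The sketch below uses exhaustive search over the non-seed nodes together with a made-up objective (edges inside the circle minus non-edges inside it); it stands in for the constrained optimization of the model's log-likelihood and is not the procedure used in the experiments.

```python
import itertools


def best_circle_with_seeds(nodes, seeds, objective):
    """Maximize objective(members) over circles that contain every seed node."""
    free = sorted(set(nodes) - set(seeds))   # only non-seed nodes remain latent
    best, best_value = None, float("-inf")
    for size in range(len(free) + 1):
        for extra in itertools.combinations(free, size):
            members = frozenset(seeds) | frozenset(extra)
            value = objective(members)
            if value > best_value:
                best, best_value = members, value
    return best


# Hypothetical toy graph: triangle a-b-c with d attached to c.
toy_edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}


def toy_objective(members):
    pairs = itertools.combinations(sorted(members), 2)
    return sum(1 if frozenset(p) in toy_edges else -1 for p in pairs)


unconstrained = best_circle_with_seeds("abcd", set(), toy_objective)
seeded = best_circle_with_seeds("abcd", {"d"}, toy_objective)
print(sorted(unconstrained), sorted(seeded))  # ['a', 'b', 'c'] ['a', 'b', 'c', 'd']
```

In this toy case a single seed changes the predicted circle: once `d` is fixed as a member, the best circle grows to include it rather than leaving it out.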
[Figure 2: histogram; axis label: fraction of overlap with most similar community]


Figure  2:  Histogram  of  overlap  between  circles  (on  Facebook).  A  value  of  zero  indicates 
that  the  circle  does  not  intersect  with  any  of  the  user’s  other  circles,  whereas  a  value  of  one 
indicates  that  a  circle  is  entirely  contained  within  another.  Approximately  25%  of  circles 
exhibit  the  latter  behavior. 


5  Dataset  Description 


Our  goal  is  to  evaluate  our  method  on  ground-truth  data.  We  expended  significant  time, 
effort,  and  resources  to  obtain  high  quality  hand-labeled  data,  which  we  have  made  available 
online.¹ We were able to obtain ego-networks and ground-truth from three major social
networking  sites:  Facebook,  Google+,  and  Twitter. 

From  Facebook  we  obtained  profile  and  network  data  from  10  ego-networks,  consisting 
of  193  circles  and  4,039  users.  To  obtain  circle  information  we  developed  our  own  Facebook 
application  and  conducted  a  survey  of  ten  users,  who  were  asked  to  manually  identify  all 
of  the  circles  to  which  their  friends  belonged.  It  took  each  user  between  2  and  3  hours 
to  label  their  entire  network.  On  average,  users  identified  19  circles  in  their  ego-networks, 
with  an  average  circle  size  of  22  friends.  Examples  of  circles  we  obtained  include  students 
of  common  universities  and  classes,  sports  teams,  relatives,  etc. 

Figure 2 shows the extent to which our 193 user-labeled circles in 10 ego-networks from
Facebook  overlap  (intersect)  with  each  other.  Around  one  quarter  of  the  identified  circles 
are  independent  of  any  other  circle,  though  a  similar  fraction  are  completely  contained 
within  another  circle  (e.g.  friends  who  studied  under  the  same  adviser  may  be  a  subset 
of  friends  from  the  same  university).  The  remaining  50%  of  communities  overlap  to  some 
extent  with  another  circle. 
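The overlap statistic discussed here can be computed per circle as follows. This sketch assumes that ‘fraction of overlap’ means the share of a circle's members that also belong to its most similar other circle, which matches the two endpoints described for Figure 2 (zero for a disjoint circle, one for a fully contained circle); the exact definition used for the figure may differ, and circles are assumed to be non-empty.

```python
def overlap_with_most_similar(circles):
    """For each circle, the largest fraction of its members shared with another circle."""
    fractions = []
    for i, circle in enumerate(circles):
        others = [c for j, c in enumerate(circles) if j != i]
        best = max((len(circle & other) / len(circle) for other in others), default=0.0)
        fractions.append(best)
    return fractions


# Hypothetical circles: one nested, one partially overlapping, one independent.
circles = [{1, 2, 3, 4}, {1, 2}, {4, 5}, {9}]
print(overlap_with_most_similar(circles))  # [0.5, 1.0, 0.5, 0.0]
```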

For the other two datasets we obtained publicly accessible data. From Google+ we
obtained data from 133 ego-networks, consisting of 479 circles and 106,674 users. The 133
ego-networks represent all 133 Google+ users who had shared at least two circles, and
whose network information was publicly accessible at the time of our crawl. The Google+
circles are quite different from those from Facebook, in the sense that their creators have
chosen to release them publicly, and because Google+ is a directed network (note that our
model can very naturally be applied to both directed and undirected networks). For
example, one circle contains candidates from the 2012 Republican primary, who presumably
do  not  follow  their  followers,  nor  each  other.  Finally,  from  Twitter  we  obtained  data  from 
1,000 ego-networks, consisting of 4,869 circles (or ‘lists’; Kim et al., 2010; Nasirifard and
Hayes, 2011; Wu et al., 2011; Zhao, 2011) and 81,362 users. The ego-networks we obtained
range in size from 10 to 4,964 nodes.

¹ http://snap.stanford.edu/data/

Taken  together  our  data  contains  1,143  different  ego-networks,  5,541  circles,  and  192,075 
users. The size differences between these datasets simply reflect the availability of data
from each of the three sources. Our Facebook data is fully labeled, in the sense that we
obtain  every  circle  that  a  user  considers  to  be  a  cohesive  community,  whereas  our  Google+ 
and  Twitter  data  is  only  partially  labeled,  in  the  sense  that  we  only  have  access  to  public 
circles.  We  design  our  evaluation  procedure  in  Section  [8]  so  that  partial  labels  cause  no 
issues. 


6  Constructing  Features  from  User  Profiles 


Profile information in all of our datasets can be represented as a tree where each level
encodes increasingly specific information (Figure [3], left). In other words, user profiles are
organized into increasingly specific categories. For example, a user’s profile might have an
education category, which would be further separated into categories such as name, location,
and type. The leaves of the tree are then specific values in these categories, e.g. Princeton,
Cambridge, and Graduate School. Several works deal with automatically building features
from tree-structured data (Haussler, 1999; Vishwanathan and Smola, 2002), but in order
to understand the relationship between circles and user profile information, we shall design
our own feature representation scheme.

We  propose  two  hypotheses  for  how  users  organize  their  social  circles:  either  they  may 
form  circles  around  users  who  share  some  common  property  with  each  other,  or  they  may 
form  circles  around  users  who  share  some  common  property  with  themselves.  For  example, 
if  a  user  has  many  friends  who  attended  Stanford,  then  they  may  form  a  ‘Stanford’  circle. 
On  the  other  hand,  if  they  themselves  did  not  attend  Stanford,  they  may  not  consider 
attendance at Stanford to be a salient feature. The feature construction schemes we propose
allow  us  to  assess  which  of  these  hypotheses  better  represents  the  data  we  obtain. 

From  Google+  we  collect  data  from  six  categories  (gender,  last  name,  job  titles,  insti¬ 
tutions,  universities,  and  places  lived).  From  Facebook  we  collect  data  from  26  categories, 
including  users’  hometowns,  birthdays,  colleagues,  political  and  religious  affiliations,  etc. 
As  a  proxy  for  profile  data,  from  Twitter  we  collect  data  from  two  categories,  namely 
the set of hashtags and mentions used by each user during two weeks’ worth of tweets.
‘Categories’ correspond to parents of leaf nodes in a profile tree, as shown in Figure [3].

We first propose a difference vector to encode the relationship between two profiles.
A non-technical description is given in Figure [3]. Essentially, we want to encode those
dimensions where two users are the same (e.g. Alan and Dilly went to the same graduate
school), and those where they are different (e.g. they do not have the same surname).
Suppose that users $v \in V$ each have an associated profile tree $T_v$, and that $l \in T_v$ is a leaf
in that tree. We define the difference vector $\sigma_{x,y}$ between two users $x$ and $y$ as a binary
indicator encoding the profile aspects where users $x$ and $y$ differ (Figure [3], bottom left):

$$\sigma_{x,y}[l] = \delta\big((l \in T_x) \neq (l \in T_y)\big). \tag{15}$$

Note  that  feature  descriptors  are  defined  per  ego-network:  while  many  thousands  of  high 
schools  (for  example)  exist  among  all  Facebook  users,  only  a  small  number  appear  among 
any  particular  user’s  friends. 
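
The difference vector of (eq. 15) can be sketched in a few lines. This is only an illustration under stated assumptions: each profile tree is flattened into a set of (category, value) leaves, the leaf index is built per ego-network as described above, and the function names, category labels, and surnames are hypothetical.

```python
def leaf_index(profiles):
    """Sorted list of every (category, value) leaf seen in this ego-network."""
    return sorted(set().union(*profiles.values()))


def difference_vector(profile_x, profile_y, leaves):
    """sigma_{x,y}[l] = 1 if exactly one of x and y has leaf l, else 0."""
    return [int((leaf in profile_x) != (leaf in profile_y)) for leaf in leaves]


# Hypothetical profiles: same school, different surname.
profiles = {
    "alan": {("education.name", "Cambridge"), ("last_name", "Turing")},
    "dilly": {("education.name", "Cambridge"), ("last_name", "Knox")},
}
leaves = leaf_index(profiles)
sigma = difference_vector(profiles["alan"], profiles["dilly"], leaves)
```

Here the shared school contributes a zero and each of the two surnames contributes a one, so the vector has one dimension per leaf that occurs among this ego's friends.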

Although the above difference vector has the advantage that it encodes profile information
at a fine granularity, it has the disadvantage that it is high-dimensional (up to
4,122 dimensions in the data we considered). One way to address this is to form difference
vectors based on the parents of leaf nodes: this way, we encode what profile categories two
users have in common, but disregard specific values (Figure [3], bottom right). For example,
we encode how many hashtags two users tweeted in common, but discard which hashtags
they tweeted:

$$\sigma'_{x,y}[p] = \sum_{l \in \mathrm{children}(p)} \sigma_{x,y}[l]. \tag{16}$$

This scheme has the advantage that it requires a constant number of dimensions, regardless
of the size of the ego-network (26 for Facebook, 6 for Google+, 2 for Twitter, as described
above).
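
A minimal sketch of the compressed vector of (eq. 16) follows. It assumes, for illustration only, that a profile is a set of (category, value) leaves, so that summing the leaf-level difference vector under each category amounts to counting the leaves held by exactly one of the two users; the function name and the example hashtags are hypothetical.

```python
from collections import Counter


def compressed_difference(profile_x, profile_y, categories):
    """sigma'_{x,y}[p]: sum of sigma_{x,y}[l] over the leaves l under category p,
    i.e. the number of leaves in p held by exactly one of the two users."""
    counts = Counter(category for category, _ in profile_x ^ profile_y)
    return [counts[p] for p in categories]


# Two hypothetical Twitter profiles built from hashtags and mentions.
x = {("hashtag", "#kdd"), ("hashtag", "#ml"), ("mention", "@snap")}
y = {("hashtag", "#ml"), ("hashtag", "#nips"), ("mention", "@snap")}
sigma_prime = compressed_difference(x, y, ["hashtag", "mention"])
```

The result has one entry per category regardless of how many distinct hashtags or mentions occur in the ego-network.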

Based on the difference vectors $\sigma_{x,y}$ (and $\sigma'_{x,y}$) we now describe how to construct edge
features $\phi(x,y)$. The first property we wish to model is that members of circles should
have common relationships with each other:

$$\phi^1(x,y) = (1; -\sigma_{x,y}). \tag{17}$$

The second property we wish to model is that members of circles should have common
relationships to the ego of the ego-network. In this case, we consider the profile tree $T_u$
from the ego user $u$. We then define our features in terms of that user:

$$\phi^2(x,y) = (1; -|\sigma_{x,u} - \sigma_{y,u}|) \tag{18}$$

($|\sigma_{x,u} - \sigma_{y,u}|$ is taken elementwise). These two parameterizations allow us to assess which
mechanism  better  captures  users’  subjective  definition  of  a  circle.  In  both  cases,  we  include 
a  constant  feature  (‘1’),  which  controls  the  probability  that  edges  form  within  circles,  or 
equivalently  it  measures  the  extent  to  which  circles  are  made  up  of  friends.  Importantly, 
this  allows  us  to  predict  memberships  even  for  users  who  have  no  profile  information,  simply 
due  to  their  patterns  of  connectivity. 

Similarly, for the ‘compressed’ difference vector $\sigma'_{x,y}$, we define

$$\psi^1(x,y) = (1; -\sigma'_{x,y}), \qquad \psi^2(x,y) = (1; -|\sigma'_{x,u} - \sigma'_{y,u}|). \tag{19}$$
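
The two parameterizations can be illustrated on a toy example. The sketch below builds compressed difference vectors from hypothetical profiles (sets of (category, value) leaves) and then forms the friend-to-friend and friend-to-ego features; all function names and data are assumptions for illustration, and the same two feature functions apply unchanged to the uncompressed difference vectors.

```python
from collections import Counter


def compressed(profile_a, profile_b, categories):
    """Per-category count of leaves held by exactly one of the two users."""
    counts = Counter(category for category, _ in profile_a ^ profile_b)
    return [counts[p] for p in categories]


def friend_features(diff_xy):
    """(1; -diff_{x,y}): constant feature followed by negated differences."""
    return [1] + [-d for d in diff_xy]


def ego_features(diff_xu, diff_yu):
    """(1; -|diff_{x,u} - diff_{y,u}|), with the absolute value elementwise."""
    return [1] + [-abs(a - b) for a, b in zip(diff_xu, diff_yu)]


categories = ["hashtag"]
x = {("hashtag", "#a"), ("hashtag", "#b")}
y = {("hashtag", "#a"), ("hashtag", "#c")}
u = {("hashtag", "#a"), ("hashtag", "#b"), ("hashtag", "#c")}  # the ego

psi1 = friend_features(compressed(x, y, categories))
psi2 = ego_features(compressed(x, u, categories), compressed(y, u, categories))
```

In this example the two friends differ from each other on two hashtags, yet each differs from the ego on exactly one, so the friend-to-friend feature is penalized while the friend-to-ego feature is not.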

To summarize, we have identified four ways of representing the compatibility between
different aspects of profiles for two users. We considered two ways of constructing a difference
vector ($\sigma_{x,y}$ vs. $\sigma'_{x,y}$) and two ways of capturing the compatibility between a pair of profiles
($\phi(x,y)$ vs. $\psi(x,y)$). The features are designed to model the following behavior:

1. Ego users build circles around common relationships between their friends ($\phi^1$, $\psi^1$).

2. Ego users build circles around common relationships between their friends and
themselves ($\phi^2$, $\psi^2$).

In  our  experiments  we  assess  which  of  these  assumptions  is  more  realistic  in  practice. 


7 Fast Inference in Large Ego-Networks


Although our algorithm is able to handle the problem sizes typically encountered in
ego-networks (i.e., fewer than 1,000 friends), scalability to larger networks presents an issue,
as  we  require  quadratic  memory  to  encode  the  compatibility  between  every  pair  of  nodes 
(an  issue  we  note  is  also  present  in  the  existing  approaches  we  consider  in  Section  [8]).  In 
this  section,  we  propose  a  more  scalable  alternative  that  makes  use  of  the  fact  that  many 
nodes  belonging  to  common  communities  also  share  common  features. 

Noting that features $\phi^1$ and $\phi^2$ described in Section [6] are binary valued, as are
community memberships, if there are $K$ communities and $F$-dimensional features, there can
be at most $2^{K+F}$ ‘types’ of node. In other words, every node’s community membership is
drawn from $\{0,1\}^K$, and every node’s feature vector is drawn from $\{0,1\}^F$, so there are
at most $2^{K+F}$ distinct community/feature combinations. Of course the number of distinct
node types is also bounded by $|V|$, the number of nodes in the graph.

In practice, however, the number of distinct node ‘types’ is much smaller, as nodes
belonging to common communities tend to have common features. Community memberships
are also not independent: in Figure [2] we observed both disjoint and hierarchically
nested communities, which means that of the $2^K$ possible community memberships, only
a fraction of them occur in practice.


In this section, we propose a Markov-Chain Monte Carlo (MCMC) sampler (Newman
and Barkema, 1999) which efficiently updates node-community memberships by ‘collapsing’
nodes that have common features and community memberships. Note that the adaptations
to be described can be applied to any types of feature (i.e., not just binary features). All
we require is that many users share the same features; we assume binary features merely
for the sake of presentation.

We start by representing each node using binary strings that encode both its community
memberships and its features. Each node’s community memberships are represented using
$S : V \to \{0,1\}^K$, such that

$$S(x)[k] = \begin{cases} 1, & \text{if } x \in C_k \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$


Similarly, each node’s features are represented using the binary string $Q$, which, since our
features are already binary, is simply the concatenation of the feature dimensions.

We now say that the ‘type’ of a node $x$ is the concatenation of its community string
and its feature string, $(S(x); Q(x))$, and we build a (sparse) table $\mathrm{types} : \{0,1\}^K \times \{0,1\}^F \to \mathbb{N}$
that counts how many nodes exist of each type.
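
One simple way to realize such a sparse table is a counter keyed by the pair of binary strings. The sketch below is illustrative only: the data layout (circles as sets of node ids, features as a dict of binary tuples) and the function names are assumptions.

```python
from collections import Counter


def node_type(node, circles, features):
    """Return (S(node), Q(node)) as a pair of binary tuples."""
    community_string = tuple(int(node in circle) for circle in circles)
    feature_string = tuple(features[node])
    return community_string, feature_string


def build_types(nodes, circles, features):
    """Sparse table mapping each occurring type to its node count."""
    return Counter(node_type(v, circles, features) for v in nodes)


nodes = ["a", "b", "c", "d"]
circles = [{"a", "b", "c"}, {"c", "d"}]           # K = 2 circles
features = {"a": (1, 0), "b": (1, 0),             # F = 2 binary features
            "c": (0, 1), "d": (0, 1)}
types = build_types(nodes, circles, features)
```

Only the types that actually occur are stored, so the table holds three entries here rather than the $2^{K+F} = 16$ that are possible in principle.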

In our setting, MCMC consists of repeatedly updating the (binary) label of each node
in a particular community. Specifically, if the marginal (log) probability that a node $x$
belongs to a community $k$ is given by $\ell_x^k$, then the node’s new label is chosen by sampling
$z \leftarrow U(0,1)$, and updating

$$S(x)[k] = \begin{cases} 1, & \text{if } z < \exp\{T(\ell_x^k(1) - \ell_x^k(0))\} \\ 0, & \text{otherwise,} \end{cases} \tag{21}$$

where $T$ is a temperature parameter that decreases at each iteration, so that we are more
likely to choose the label with higher probability as the model ‘cools’.

Computing $\ell_x^k(0)$ and $\ell_x^k(1)$ (the probability that node $x$ takes the label 0 or 1 in
community $k$) requires computing $p((x,y) \in E)$ for all $y \in V$. However, we note that if
two nodes $y$ and $y'$ have the same type (i.e., they belong to the same communities and
have the same features), then $p((x,y) \in E) = p((x,y') \in E)$. In order to maximize the
log-likelihood of the observed data, we must also consider whether $(x,y)$ and $(x,y')$ are actually
edges in the graph. To do so, we first compute $\ell_x^k(0)$ and $\ell_x^k(1)$ under the assumption that
no edges are incident on $x$, after which we correct for those edges incident on $x$. Thus the
running time of a single update is linear in the number of distinct node types, plus the
average node degree, both of which are bounded by the number of nodes.

The entire procedure is demonstrated in Algorithm [2].


ALGORITHM 2: Update memberships node $x$ and circle $k$.

Data: node $x$ whose membership to circle $C_k$ is to be updated
Result: updated membership for node $x$
initialize $\ell(0) := 0$, $\ell(1) := 0$;
construct a dummy node $x_0$ with the communities and features of $x$ but with $x \notin C_k$;
construct a dummy node $x_1$ with the communities and features of $x$ but with $x \in C_k$;
for $(c, f) \in \mathrm{dom}(\mathrm{types})$ do
    // c = community string, f = feature string
    $n := \mathrm{types}(c, f)$;
    // n = number of nodes of this type
    if $S(x) = c \wedge Q(x) = f$ then
        // avoid including a self-loop on x
        $n := n - 1$;
    end
    construct a dummy node $y$ with community memberships $c$ and features $f$;
    // first compute probabilities assuming all pairs (x, y) are non-edges
    $\ell(0) := \ell(0) + n \log p((x_0, y) \notin E)$;
    $\ell(1) := \ell(1) + n \log p((x_1, y) \notin E)$;
end
for $(x, y) \in E$ do
    // correct for edges incident on x
    $\ell(0) := \ell(0) - \log p((x_0, y) \notin E) + \log p((x_0, y) \in E)$;
    $\ell(1) := \ell(1) - \log p((x_1, y) \notin E) + \log p((x_1, y) \in E)$;
end
// update membership to circle k
$\mathrm{types}(S(x), Q(x)) := \mathrm{types}(S(x), Q(x)) - 1$;
$t \leftarrow U(0, 1)$;
if $t < \exp\{T(\ell(1) - \ell(0))\}$ then
    $S(x)[k] := 1$
else
    $S(x)[k] := 0$
end
$\mathrm{types}(S(x), Q(x)) := \mathrm{types}(S(x), Q(x)) + 1$;
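
The structure of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration, not the implementation used in the experiments: the edge model `toy_p_edge` is a made-up stand-in for the model's edge probability (which is defined elsewhere in the paper), and the data layout (`S` and `Q` as dicts of tuples, `types` as a `Counter`, `neighbors` as adjacency sets) is an assumption.

```python
import math
import random
from collections import Counter


def update_membership(x, k, S, Q, types, neighbors, p_edge, T=1.0, rng=random):
    """Resample S[x][k], touching each distinct node type once."""
    def x_with(label):
        s = list(S[x])
        s[k] = label
        return tuple(s), Q[x]

    x_types = [x_with(0), x_with(1)]  # dummy nodes x_0 and x_1
    ell = [0.0, 0.0]

    # Pass over distinct types, treating every pair (x, y) as a non-edge.
    for y_type, n in types.items():
        if y_type == (S[x], Q[x]):
            n -= 1  # do not pair x with itself
        for label in (0, 1):
            ell[label] += n * math.log(1.0 - p_edge(x_types[label], y_type))

    # Correct the terms of nodes that really are adjacent to x.
    for y in neighbors[x]:
        y_type = (S[y], Q[y])
        for label in (0, 1):
            p = p_edge(x_types[label], y_type)
            ell[label] += math.log(p) - math.log(1.0 - p)

    # Move x between type buckets according to the sampled label.
    types[(S[x], Q[x])] -= 1
    # Capping the exponent at 0 avoids overflow and leaves the test unchanged,
    # because a uniform draw in [0, 1) is always below any value >= 1.
    accept = rng.random() < math.exp(min(0.0, T * (ell[1] - ell[0])))
    S[x] = x_types[1 if accept else 0][0]
    types[(S[x], Q[x])] += 1
    return S[x][k]


def toy_p_edge(type_a, type_b):
    """Hypothetical edge model: likely edge if the nodes share any circle."""
    shared = sum(a & b for a, b in zip(type_a[0], type_b[0]))
    return 0.9 if shared else 0.1


S = {"a": (1,), "b": (1,), "c": (0,), "d": (0,)}   # one circle, K = 1
Q = {v: () for v in S}                             # no features, F = 0
neighbors = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
types = Counter((S[v], Q[v]) for v in S)
new_label = update_membership("c", 0, S, Q, types, neighbors, toy_p_edge)
```

In this toy graph node `c` is connected to both members of the circle, so joining the circle has the higher log-probability and the update always assigns it the label 1.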


We also exploit the same observation when computing partial derivatives of the
log-likelihood; that is, we first efficiently compute derivatives under the assumption that the
graph contains no edges, and then correct the result by summing over all edges in $E$.




8  Experiments 


We first describe the evaluation metrics to be used in Sections 8.1 and 8.2, before
describing the baselines to be evaluated in Section 8.3. We describe the performance of our
(unsupervised) algorithm in Section 8.4, and extensions in Sections 8.6, 8.7, and 8.8.


8.1  Evaluation  metrics 

Although our method is unsupervised, we can evaluate it on ground-truth data by
examining the maximum-likelihood assignments of the latent circles $\mathcal{C} = \{C_1 \ldots C_K\}$ after
convergence. Our goal is that for a properly regularized model, the latent circles will align
closely with the human labeled ground-truth circles $\bar{\mathcal{C}} = \{\bar{C}_1 \ldots \bar{C}_{\bar{K}}\}$.

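To measure the alignment between a predicted circle $C$ and a ground-truth circle $\bar{C}$, we
compute the Balanced Error Rate (BER) between the two circles (Chen and Lin, 2006),

$$\mathrm{BER}(C, \bar{C}) = \frac{1}{2}\left( \frac{|C \setminus \bar{C}|}{|C|} + \frac{|C^c \setminus \bar{C}^c|}{|C^c|} \right). \tag{22}$$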


This  measure  assigns  equal  importance  to  false  positives  and  false  negatives,  so  that  trivial 
or  random  predictions  incur  an  error  of  0.5  on  average.  Such  a  measure  is  preferable  to  the 
0/1  loss  (for  example),  which  assigns  extremely  low  error  to  trivial  predictions.  We  also 
report the $F_1$ score, which we find produces qualitatively similar results.
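
Both measures are straightforward to compute from sets of node ids. The sketch below follows the form of (eq. 22), with complements taken with respect to the nodes of the ego-network; the handling of empty sets (treating an undefined ratio as zero) and the function names are assumptions made for illustration.

```python
def balanced_error_rate(predicted, truth, universe):
    """0.5 * (|C \\ Cbar| / |C| + |C^c \\ Cbar^c| / |C^c|)."""
    pred_c = universe - predicted
    truth_c = universe - truth
    inside = len(predicted - truth) / len(predicted) if predicted else 0.0
    outside = len(pred_c - truth_c) / len(pred_c) if pred_c else 0.0
    return 0.5 * (inside + outside)


def f1_score(predicted, truth):
    """Harmonic mean of precision and recall of the predicted circle."""
    hits = len(predicted & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


universe = set(range(1, 9))        # eight friends in the ego-network
predicted = {1, 2, 3, 4}
truth = {1, 2, 3, 5}
ber = balanced_error_rate(predicted, truth, universe)
f1 = f1_score(predicted, truth)
```

In this example one of four predicted members is wrong and one of four excluded nodes should have been included, giving a BER of 0.25 and an $F_1$ score of 0.75.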


8.2  Aligning  predicted  and  ground-truth  circles 

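Since we do not know the correspondence between circles in $\mathcal{C}$ and $\bar{\mathcal{C}}$, we compute the
optimal match via linear assignment by maximizing:

$$\max_{f : \mathcal{C} \to \bar{\mathcal{C}}} \frac{1}{|f|} \sum_{C \in \mathrm{dom}(f)} \big(1 - \mathrm{BER}(C, f(C))\big), \tag{23}$$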


where $f$ is a (partial) correspondence between $\mathcal{C}$ and $\bar{\mathcal{C}}$. That is, if the number of predicted
circles $|\mathcal{C}|$ is less than the number of ground-truth circles $|\bar{\mathcal{C}}|$, then every circle $C \in \mathcal{C}$
must have a match $\bar{C} \in \bar{\mathcal{C}}$, but if $|\mathcal{C}| > |\bar{\mathcal{C}}|$, we do not incur a penalty for additional
predictions that could have been circles but were not included in the ground-truth. We
use established techniques to estimate the number of circles, so that none of the baselines
suffers a disadvantage by mispredicting $\bar{K} = |\bar{\mathcal{C}}|$.
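
For small instances the matching objective of (eq. 23) can be checked by brute force. The sketch below enumerates every one-to-one partial correspondence of size min(number of predicted circles, number of ground-truth circles) and keeps the best; it is meant only to make the objective concrete, and a dedicated linear assignment solver would be used at realistic sizes. The function names and toy data are assumptions.

```python
from itertools import combinations, permutations


def ber(predicted, truth, universe):
    """Balanced Error Rate in the form of (eq. 22); empty ratios count as 0."""
    pred_c, truth_c = universe - predicted, universe - truth
    inside = len(predicted - truth) / len(predicted) if predicted else 0.0
    outside = len(pred_c - truth_c) / len(pred_c) if pred_c else 0.0
    return 0.5 * (inside + outside)


def best_alignment(predicted, truth, universe):
    """Return (score, pairs) maximizing the mean of 1 - BER over matched pairs."""
    m = min(len(predicted), len(truth))
    if m == 0:
        return 0.0, []
    best_score, best_pairs = -1.0, []
    # Each size-m matching arises once: pick the matched ground-truth circles,
    # then the ordered predicted circles assigned to them.
    for true_idx in combinations(range(len(truth)), m):
        for pred_idx in permutations(range(len(predicted)), m):
            pairs = sorted(zip(pred_idx, true_idx))
            score = sum(1.0 - ber(predicted[i], truth[j], universe)
                        for i, j in pairs) / m
            if score > best_score:
                best_score, best_pairs = score, pairs
    return best_score, best_pairs


universe = set(range(1, 9))
predicted = [{1, 2, 3, 4}, {5, 6}]
truth = [{5, 6}, {1, 2, 3}, {7, 8}]
score, pairs = best_alignment(predicted, truth, universe)
```

With fewer predicted than ground-truth circles, every predicted circle is matched and the unmatched ground-truth circle `{7, 8}` is simply left out of the average.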

In the case of Facebook (where we have ‘complete’ ground-truth, in the sense that
survey participants ostensibly label every circle), our method ought to penalize predicted
circles that do not appear in the ground-truth. A simple penalty would be to assign an
error of 0.5 (i.e., that of a random prediction) to additional circles in the case of Facebook.
However, in our experience, our method did not overpredict the number of circles in the case
of Facebook: on average, users identified 19 circles, whereas using the Bayesian Information
Criterion described in Section 3.1, our method never predicted K > 10. In practice this
means that in the case of Facebook, we always penalize all predictions. Again we note that
the process of choosing the number of circles using the BIC is a standard procedure from
the literature (Airoldi et al., 2008; Handcock et al., 2007a; Volinsky and Raftery, 2000),
whose merit we do not assess in this paper.

Network Modularity. Although for our algorithm, and other probabilistic baselines,
we shall choose the number of communities using the Bayesian Information Criterion as
described in Section 3.1, another standard criterion used to determine the number of
communities in a network is the modularity (Newman, 2006).


The Bayesian Information Criterion has the advantage that it allows for overlapping
communities, whereas the modularity does not (i.e., it assumes all communities are
disjoint); it is for this reason that we used the BIC to choose K for our algorithm. On
the other hand, the Bayesian Information Criterion can only be computed for probabilistic
models (i.e., models that associate a likelihood with each prediction), whereas the
modularity has no such restriction. For this reason, we shall use the modularity to choose K for
non-probabilistic baselines.

The modularity essentially measures the extent to which clusters in a network have
dense internal, but sparse external, connections (Newman, 2003). If $e_{ij}$ is the fraction of
edges in the network that connect vertices in $C_i$ to vertices in $C_j$, then the modularity is
defined as

$$Q(K) = \sum_{i=1}^{K} \left( e_{ii} - \Big( \sum_{j=1}^{K} e_{ij} \Big)^{2} \right). \tag{24}$$

We then choose K so that the modularity is maximized.
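
As a concrete check of (eq. 24), the modularity of a disjoint partition of an undirected graph can be computed as sketched below. The convention of splitting each between-community edge evenly across $e_{ij}$ and $e_{ji}$ (so that the matrix is symmetric and sums to one) is an assumption of this sketch, as are the function name and the toy graph.

```python
def modularity(edges, communities):
    """Q = sum_i (e_ii - (sum_j e_ij)^2) for a disjoint partition."""
    label = {v: i for i, members in enumerate(communities) for v in members}
    k = len(communities)
    e = [[0.0] * k for _ in range(k)]
    m = len(edges)
    for u, v in edges:
        i, j = label[u], label[v]
        # Half of each edge goes to e[i][j] and half to e[j][i]; an edge
        # inside community i therefore adds a full 1/m to e[i][i].
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    return sum(e[i][i] - sum(e[i]) ** 2 for i in range(k))


# Two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
q_two = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_one = modularity(edges, [{1, 2, 3, 4, 5, 6}])
```

Splitting the graph into its two triangles gives a higher modularity than a single community, so K = 2 would be preferred over K = 1 on this toy graph.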


8.3  Baselines 

We considered a wide range of baseline methods, including those that consider only
network  structure,  those  that  consider  only  profile  information,  and  those  that  consider 
both. 

Mixed Membership Stochastic Block Models (Airoldi et al., 2008). This method
detects communities based only on graph structure; the output is a stochastic vector for
each node encoding partial memberships to each community. The optimal number of
communities K is determined using the Bayesian Information Criterion as described in
(eq. 11). This model is similar to those of Liu et al. (2009) and Chang and Blei (2009), the
latter of which includes the implementation of MMSB that we used. Since we require ‘hard’
memberships for evaluation, we assign a node to a community if its partial membership to
that community is positive.

Block-LDA (Balasubramanyan and Cohen, 2011). This method is similar to MMSB, except
that it allows nodes to be augmented with side information in the form of ‘documents’.
For our purposes, we generate ‘documents’ by treating aspects of user profiles as words in
a bag-of-words model.

K-means clustering (MacKay, 2003). Just as MMSB uses only the graph structure,
K-means clustering ignores the graph structure and uses only node features (for node features
we again use a bag-of-words model). Here, we choose K so as to maximize the modularity
of $\mathcal{C}$, as defined in (eq. [24]).

Hierarchical Clustering (Johnson, 1967). This method builds a hierarchy of clusters.
Like K-means, this method forms clusters based only on node profiles, but ignores the
network.

Link Clustering (Ahn et al., 2010). Conversely, this method uses network structure,
but ignores node features to construct hierarchical communities in networks.

Table 1: Baselines

Algorithm                    network     node/edge  overlapping   hard
                             structure?  features?  communities?  memberships?
MMSB                         Yes         No         Yes           No
Block-LDA                    Yes         Yes        Yes           No
K-means                      No          Yes        No            Yes
Hierarchical Clustering      No          Yes        Yes           Yes
Link Clustering              Yes         No         No            Yes
Clique Percolation           Yes         No         Yes           Yes
Low-Rank Embedding           Yes         Yes        No            Yes
Multi-Assignment Clustering  No          Yes        Yes           Yes
Our algorithm                Yes         Yes        Yes           Yes

Clique Percolation (Palla et al., 2005). This method also uses only network structure,
and builds communities from the union of small, densely-connected sub-communities.


Low-Rank Embedding (Yoshida, 2010). Uses both graph structure and node similarity
information, but does not perform any learning. We adapt an algorithm described by
Yoshida (2010), where node similarities are based on the cosine distance between profile
bags-of-words. After our features are embedded into a low-dimensional space, we again
use K-means clustering to detect communities, again choosing K so as to maximize the
modularity.

Multi-Assignment Clustering (Frank et al., 2012). Like ours, this method predicts
hard assignments to multiple clusters, though it does so without using the network
structure.

The above methods (and our own) are summarized in Table [1]. Of the eight baselines
highlighted above we report the three whose overall performance was the best, namely
Block-LDA (Balasubramanyan and Cohen, 2011) (which slightly outperformed mixed
membership stochastic block models (Airoldi et al., 2008)), Low-Rank Embedding (Yoshida,
2010), and Multi-Assignment Clustering (Frank et al., 2012).


8.4  Performance  on  Facebook,  Google+,  and  Twitter  Data 

Figure [4] shows results on our Facebook, Google+, and Twitter data. The largest circles
from Google+ were excluded as they exhausted the memory requirements of many of the
baseline algorithms. Circles were aligned as described in (eq. 23), with the number of circles
K determined as described in Section [3]. For non-probabilistic baselines, we chose K so
as to maximize the modularity, as described in (eq. 24). In terms of absolute performance
our best model $\phi^1$ achieves BER scores of 0.84 on Facebook, 0.72 on Google+ and 0.70 on
Twitter ($F_1$ scores are 0.59, 0.38, and 0.34, respectively). The lower $F_1$ scores on Google+
and Twitter are explained by the fact that many circles have not been maintained since
they were initially created: we achieve high recall (we recover the friends in each circle), but
at low precision (we recover additional friends who appeared after the circle was created).

Comparing our method to baselines we notice that we outperform all baselines on all
datasets by a statistically significant margin. Compared to the nearest competitors, our
best performing features $\phi^1$ improve on the BER by 43% on Facebook, 26% on Google+,
and 16% on Twitter (improvements in terms of the $F_1$ score are similar). Regarding the
performance of the baseline methods, we note that good performance seems to depend
critically on predicting hard memberships to multiple circles, using a combination of node
and edge information; none of the baselines from Table [1] exhibits precisely this combination,
a shortcoming our model addresses.

Both of the features we propose (friend-to-friend features $\phi^1$ and friend-to-user features
$\phi^2$) perform similarly, revealing that both schemes ultimately encode similar information,
which is not surprising, since users and their friends have similar profiles. Using the
‘compressed’ features $\psi^1$ and $\psi^2$ does not significantly impact performance, which is promising
since they have far lower dimension than the full features; what this reveals is that it is
sufficient to model categories of attributes that users have in common (e.g. same school,
same town), rather than the attribute values themselves.

We found that all algorithms perform significantly better on Facebook than on Google+
or Twitter. There are a few explanations. Firstly, our Facebook data is complete, in the
sense that survey participants manually labeled every circle in their ego-networks, whereas
in other datasets we only observe publicly-visible circles, which may not be up-to-date.
Secondly, the 26 profile categories available from Facebook are more informative than the
6 categories from Google+, or the tweet-based profiles we built from Twitter. A more basic
difference lies in the nature of the networks themselves: edges in Facebook encode mutual
ties, whereas edges in Google+ and Twitter encode follower relationships, which changes
the role that circles serve (Wu et al., 2011). The latter two points explain why algorithms
that use either edge or profile information in isolation are unlikely to perform well on this
data.


8.5  Qualitative  Analysis 

Next  we  examine  the  output  of  our  model  in  greater  detail.  Figure  [5]  shows  results  of  our 
unsupervised method on example ego-networks from Facebook and Google+. Different
colors indicate true and false positives and negatives. Our method is correctly able to identify
overlapping  circles  as  well  as  sub-circles  (circles  within  circles). 

Figure  [6]  shows  parameter  vectors  learned  for  four  circles  for  a  particular  Facebook 
user.  Positive  weights  indicate  properties  that  users  in  a  particular  circle  have  in  common. 
Notice  how  the  model  naturally  learns  the  social  dimensions  that  lead  to  a  social  circle. 
Moreover,  the  first  parameter  that  corresponds  to  a  constant  feature  ‘1’  has  the  highest 
weight;  this  reveals  that  membership  to  the  same  community  provides  the  strongest  signal 
that  edges  will  form,  while  profile  data  provides  a  weaker  (but  still  relevant)  signal. 


8.6  Circle  Maintenance 

Next we examine the problem of adding new users to already-defined ego-networks, in
which complete circles have already been provided. For evaluation, we suppress a single
user $u$ from a user’s ego-network, and learn the model parameters $\Theta$ that best fit $G \setminus \{u\}$
and $\mathcal{C} \setminus \{u\}$. Our goal is then to recover the set of communities to which the node $u$ belongs,
as described in Section 4.1. Again we report the Balanced Error Rate and $F_1$ score between
the ground-truth and the predicted set of community memberships for $u$. We use all of
each user’s circles for training, up to a maximum of fifteen circles. This experiment is
repeated for 10 random choices of the user $u$ for each ego-network in our dataset.

As a baseline we compare the performance of our algorithm to that of a fully-supervised
Support Vector Machine (SVM) model. For each community $C_k$, we train a binary
classifier that discriminates members from non-members based on their node features. Binary
classifications are then made for each community independently.

Performance on this task is shown in Figure [7]. On Facebook, Google+, and Twitter
our best performing features $\phi^1$ achieve Balanced Error Rates of 0.30, 0.34, and 0.34
(respectively), and $F_1$ scores of 0.38, 0.59, and 0.54. The SVM model achieves better
accuracy when rich node features are available (which is the case for Facebook), though it
fails to make use of edge information, and does not account for interdependencies between
circles. This proves critical in the case of Google+ and Twitter, where node information
alone proves uninformative.

8.7  Semi-Supervised  Circle  Prediction 

Our next task is to identify circles using a form of weak supervision provided by the user,
in the form of seed nodes as described in Section [4.2]. In this setting, the user provides S
seed nodes for each of K circles that they wish to identify. For evaluation, we select the K
circles to be identified and the S seed nodes uniformly at random.

Without seed nodes (as in our initial experiments), the circles that are automatically
identified by our algorithm may be quite different from those identified once seed nodes are
added. Similarly, there may be many circles containing the same seed nodes, meaning that
different solutions may be chosen for different values of S. Thus it is difficult to compare
the loss of (eq. [23]) with and without seed nodes. To address this, we modify the matching
objective of (eq. [23]) so that the K circles randomly selected for seeding must be the same
as those matched when evaluating the loss. Thus the loss is always evaluated on the same
K circles for every number of seed nodes $S \in \{0 \ldots 10\}$. Note also that for each value of K,
performance is only evaluated on those ego-networks with at least K ground-truth circles.

Figure [8] shows the performance of our algorithm for different numbers of seed nodes
$S \in \{0 \ldots 10\}$ and different numbers of circles $K \in \{1 \ldots 5\}$. Results in terms
of the $F_1$ score are qualitatively similar and are omitted for brevity. We find that for all
values of K, adding seed nodes increases the accuracy significantly, though the effect is
most pronounced when the number of circles that the user wishes to identify is small.

Curiously,  we  find  that  while  larger  values  of  K  lead  to  better  prediction  when  there 
are  no  seeds,  the  opposite  is  true  when  there  are  many  seeds.  The  former  behavior  may  be 
explained  by  the  simple  fact  that  larger  values  of  K  are  better  able  to  fit  the  data,  though 
the  latter  behavior  is  more  enigmatic.  Pleasingly,  assuming  that  a  user  wishes  to  identify 
only  a  small  number  of  circles  at  a  time,  then  they  can  do  so  with  very  few  seeds:  for  small 
K,  most  of  the  benefit  is  gained  once  only  two  or  three  seeds  are  provided. 

8.8  Scalability  Analysis 

Figure 9 examines how our algorithm scales with the size of an ego-network. Here we use
the Markov Chain Monte Carlo (MCMC) version of our algorithm described in Section 7.
Figure 9 shows the total time taken to predict different numbers of circles in differently
sized ego-networks. Since the performance of our MCMC algorithm is a function of the
number of circles K and the feature dimensionality F, we fix the feature dimensionality
at F = 10 for all ego-networks, using the ten most common ‘friend-to-friend’ features φ^1
that appear in each ego-network.
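
Fixing the feature dimensionality can be sketched as below. As a simplifying assumption, the sketch counts per-node profile entries rather than the pairwise friend-to-friend feature vectors, and the names `most_common_features` and `restrict` are hypothetical.

```python
from collections import Counter


def most_common_features(node_features, F=10):
    """Return the F features that occur for the most nodes in one ego-network."""
    counts = Counter()
    for feats in node_features.values():
        counts.update(set(feats))
    # Break ties by name so the selection is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:F]]


def restrict(node_features, kept):
    """Encode every node as a binary vector over the kept features."""
    kept = list(kept)
    return {
        node: [int(f in feats) for f in kept]
        for node, feats in node_features.items()
    }


profiles = {
    "a": {"school:Stanford", "lang:German"},
    "b": {"school:Stanford"},
    "c": {"lang:German", "work:Acme"},
    "d": {"school:Stanford"},
}
kept = most_common_features(profiles, F=2)
vectors = restrict(profiles, kept)
```

Every ego-network then yields vectors of the same length F, so running times can be compared across networks of different sizes.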

For comparison, Figure 9 shows the running time of inference using QPBO as described
in Section 3. Although the two algorithms are competitive for up to a few hundred nodes,
the  QPBO  algorithm  becomes  intractable  for  networks  of  around  1000  nodes,  since  it 
requires  us  to  optimize  a  probability  distribution  defined  on  complete  graphs  (in  practice, 
in  order  to  apply  the  QPBO  algorithm  in  the  previous  experiments,  we  did  not  construct 
complete  graphs,  but  rather  included  only  those  edges  whose  influence  on  the  likelihood 
was  maximal). 

Although this version of the algorithm is not particularly efficient for small networks
(identifying K = 10 circles on an ego-network with 1000 nodes requires around one hour),
it has the advantage that it is easily able to scale to the largest ego-networks that are ever
encountered. For very large networks, the algorithm is able to take advantage of the fact
that many nodes with the same features and community memberships can be ‘collapsed’, so
that the running time increases only modestly between 2500 and 5000 node ego-networks.
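
The idea of collapsing nodes can be illustrated with a short sketch. It only shows the grouping step, under the assumption that two nodes are interchangeable when both their feature vectors and their current circle memberships coincide. How the sampler then uses the groups is not specified here, and `collapse_nodes` is a hypothetical name.

```python
from collections import defaultdict


def collapse_nodes(features, memberships):
    """Group nodes that share a feature vector and a set of circle memberships."""
    groups = defaultdict(list)
    for node in features:
        key = (tuple(features[node]), frozenset(memberships.get(node, ())))
        groups[key].append(node)
    return dict(groups)


features = {1: (1, 0), 2: (1, 0), 3: (0, 1), 4: (1, 0)}
memberships = {1: {0}, 2: {0}, 3: {0, 1}, 4: {1}}
groups = collapse_nodes(features, memberships)
```

Here nodes 1 and 2 fall into one group, so any computation that depends only on features and memberships needs to be carried out once per group rather than once per node.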

Figure 10 shows the accuracy of our MCMC algorithm in terms of the Balanced Error
Rate and F1 score. We note that the best performance of our algorithm is obtained on
reasonably small ego-networks, though in practice small networks account for the vast
majority of our data. Note that the results for any particular value of K are slightly worse
than those reported in Figure 4, since we are not selecting K using the BIC described in
Section 3.1. Although performance clearly degrades for large ego-networks, it remains an
open question whether this is due to the difficulty of optimization on large networks, or
simply due to the fact that our model assumptions become increasingly violated as large
networks become less ‘community-like’.
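
For reference, the two accuracy measures can be computed for a single predicted circle as in the sketch below. It assumes the standard definitions (the Balanced Error Rate as the mean of the false-negative and false-positive rates, and F1 as the harmonic mean of precision and recall); the exact normalization used in the evaluation may differ, and the function names are illustrative.

```python
def balanced_error_rate(predicted, truth, universe):
    """Mean of the false-negative rate and the false-positive rate."""
    predicted, truth, universe = set(predicted), set(truth), set(universe)
    negatives = universe - truth
    fn_rate = len(truth - predicted) / len(truth) if truth else 0.0
    fp_rate = len(predicted & negatives) / len(negatives) if negatives else 0.0
    return 0.5 * (fn_rate + fp_rate)


def f1_score(predicted, truth):
    """Harmonic mean of precision and recall for one predicted circle."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


universe = range(10)      # all nodes in the ego-network
truth = {0, 1, 2, 3}      # ground-truth circle
predicted = {0, 1, 2, 4}  # detected circle

ber = balanced_error_rate(predicted, truth, universe)
accuracy = 1 - ber        # the quantity plotted as accuracy
f1 = f1_score(predicted, truth)
```

Because the Balanced Error Rate weights both classes equally, a trivial prediction such as an empty circle or the whole ego-network scores 0.5 regardless of circle size.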


9  Discussion  and  Future  Work 

We  have  modeled  circle  detection  as  a  problem  that  can  be  solved  independently  for  each 
user.  In  practice  this  assumption  is  advantageous,  as  it  allows  us  to  deal  with  several 
small  problems  independently,  using  sophisticated  models  that  could  not  easily  scale  to 
networks with millions of nodes. However, it is possible that circles could be more accurately
predicted by exploiting relationships between the circles of multiple users. For
example, if a user has a ‘Stanford’ circle in their ego-network, it is highly likely that users
belonging to that circle will also have Stanford circles within their own ego-networks. Alternatively,
if a Stanford community could be detected across the entire Facebook, Google+,
or  Twitter  network,  then  a  user’s  ‘Stanford’  circle  might  simply  be  the  intersection  of  their 
ego-network  with  that  community.  Although  studying  such  models  is  an  appealing  avenue 
for  future  work,  it  is  unfortunately  not  possible  using  our  data,  where  we  do  not  have 
access  to  complete  network  information. 

Although we developed algorithms that scale to the largest ego-networks that we encountered,
we find that performance is best on ego-networks with up to a few
hundred nodes, and degrades significantly for networks with more than 1000. It remains to
be seen whether this is a shortcoming of our algorithm (due to the fact that optimization
is more difficult for large networks), or whether the assumptions of our model simply break
down at large scales. Our fundamental assumption that circles will be made up of close-knit
groups of friends with common properties seems like a better fit to networks with at
most a few hundred nodes.

We  also  found  that  performance  on  even  the  largest  Facebook  networks  (i.e.,  over  1000 
friends) was better than that obtained on small networks from Google+ and Twitter. This
suggests  that  it  is  not  merely  the  size  of  the  networks  that  causes  our  model  assumptions  to 
become  violated,  but  rather  the  very  nature  of  the  networks  themselves  (in  addition  to  the 
differences  in  the  ground-truth  already  mentioned).  Naturally,  a  circle  containing  members 
of  the  same  squash  team  (as  we  find  on  Facebook)  is  fundamentally  different  from  a  circle 
containing  presidential  candidates  (as  we  find  on  Google+).  It  remains  to  design  a  circle 
detection  algorithm  that  is  tailored  for  networks  with  asymmetric  following  relationships. 

10  Conclusion 

‘Circles’  allow  us  to  organize  the  overwhelming  volumes  of  data  generated  by  our  personal 
social  networks,  though  they  are  laborious  to  construct  manually.  We  have  designed  an 
algorithm  to  automatically  detect  circles  in  ego-networks,  which  we  evaluated  on  a  dataset 
of  1,143  ego-networks  and  5,541  ground-truth  circles  obtained  from  Facebook,  Google+, 
and  Twitter.  We  find  in  such  data  circles  that  are  disjoint,  overlapping,  and  hierarchically 
nested,  and  design  our  model  with  such  behavior  in  mind.  Our  model  is  unsupervised,  but 
can  also  make  use  of  weakly-labeled  data  that  may  be  available  in  practice.  Experiments 
reveal  that  social  circles  can  be  accurately  detected  using  a  combination  of  both  network 
and  profile  information. 

Figure 3: Feature construction. Profiles are tree-structured, and we construct features by
comparing paths in those trees. Examples of trees for two users x (blue) and y (pink) are
shown at top. Two schemes for constructing feature vectors from these profiles are shown at
bottom: (1) (bottom left) we construct binary indicators measuring the difference between
leaves in the two trees, e.g. ‘work → position → Cryptanalyst’ appears in both trees. (2)
(bottom right) we sum over the leaf nodes in the first scheme, maintaining the fact that
the two users worked at the same institution, but discarding the identity of that institution.

Figure 4: Performance on Facebook, Google+, and Twitter, in terms of the Balanced Error
Rate (top), and the F1 score (bottom). Higher is better. Error bars show standard error.
The improvements of our best features φ^1 compared to the nearest competitor are significant
at the 1% level or better.

Figure 5: Top: Three detected circles on a small ego-network from Facebook, compared to
three ground-truth circles (BER ≈ 0.81). Blue nodes: true positives. Grey: true negatives.
Red: false positives. Yellow: false negatives. Our method correctly identifies the largest
circle (left), a sub-circle contained within it (center), and a third circle that significantly
overlaps with it (right). Bottom: Four detected circles on ego-networks from Google+
(BER ≈ 0.73). Green nodes in the two right networks show additional detected circles,
whose accuracy cannot be evaluated as we only observed two circles in the ground-truth.

Figure 6: Parameter vectors of four communities for a particular Facebook user. The
top four plots show ‘complete’ features φ^1, while the bottom four plots show ‘compressed’
features ψ^1 (in both cases, BER ≈ 0.78). For example, the former features encode the fact
that members of a particular community tend to speak German, while the latter features
encode the fact that they speak the same language. (Personally identifiable annotations
have been suppressed.)

Figure 7: Accuracy of assigning a new node to already-existing circles. Although a fully-supervised
Support Vector Machine gives accurate results on Facebook (where node features
are highly informative), our model yields far better results on Google+ and Twitter
data.

Figure 8: Number of seeds versus accuracy (1 - Balanced Error Rate) for different numbers
of circles K. For each of the K circles being identified, the user provides the same number
of seeds. Although providing additional seeds is generally beneficial to performance for all
K, the benefit is most pronounced when the number of circles to be identified is small.
Results in terms of the F1 score are qualitatively similar and are omitted for brevity.


Figure 9: Running time of our Markov Chain Monte Carlo (MCMC) algorithm for different
ego-network sizes and different values of K (the number of circles to be detected). For
comparison, our previously described inference algorithm (based on QPBO (Rother et al.,
2007)) is shown for K = 10.

Figure 10: Accuracy of our Markov Chain Monte Carlo (MCMC) algorithm, in terms of
the Balanced Error Rate (top), and the F1 score (bottom).